Conference Proceedings

Beyond Perception: Evaluating Abstract Visual Reasoning through Multi-Stage Task

Y Jiang, Y Ding, C Lei, J Ao, JH Lau, KA Ehinger

Proceedings of the Annual Meeting of the Association for Computational Linguistics | Association for Computational Linguistics | Published : 2025

Abstract

Current Multimodal Large Language Models (MLLMs) excel in general visual reasoning but remain underexplored in Abstract Visual Reasoning (AVR), which demands higher-order reasoning to identify abstract rules beyond simple perception. Existing AVR benchmarks focus on single-step reasoning, emphasizing the end result but neglecting the multi-stage nature of the reasoning process. Past studies found that MLLMs struggle with these benchmarks, but they do not explain how the models fail. To address this gap, we introduce MultiStAR, a Multi-Stage AVR benchmark based on RAVEN, to assess reasoning across varying levels of complexity. Additionally, existing metrics like accuracy only focus on the final outcomes while..

Grants

Awarded by Australian Research Council

